Structured Information Retrieval for Web Documents
Authors
Abstract
To overcome the limitations of conventional Web search engines in retrieving Web documents relevant to users' queries, one has to exploit the semantic structures embedded in Web documents. We propose a Web Information Retrieval (WebIR) model for Web documents containing semantic elements, which are text segments enclosed by special tags. These special tags, known as semantic tags, can either be created independently for individual Web documents or be standardized for a collection of Web documents sharing common types of semantic elements. The WebIR model supports queries on both intra-document semantic elements and inter-document links, and returns directed graphs as query results. Each directed graph represents a cluster of connected Web documents satisfying the given query. The directed graphs in a WebIR query result are then ranked. In this paper, we describe the WebIR model and an ongoing implementation effort to realize the model.
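The query behaviour described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `WebDoc` class, the `query` function, the keyword-in-tag matching rule, and the rank-by-cluster-size ordering are all assumptions made for illustration, since the abstract does not specify the query language or the ranking criterion.

```python
from dataclasses import dataclass, field


@dataclass
class WebDoc:
    """Hypothetical document model: semantic elements plus outgoing links."""
    url: str
    elements: dict                              # semantic tag -> enclosed text
    links: set = field(default_factory=set)     # URLs this document links to


def query(docs, tag, keyword):
    """Return clusters of linked, matching documents as (nodes, edges) pairs.

    Assumption: a document matches when `keyword` occurs in the text of its
    `tag` element. Assumption: clusters are ranked by node count, largest first.
    """
    hits = {d.url: d for d in docs
            if keyword.lower() in d.elements.get(tag, "").lower()}
    # Keep only directed links whose source and target both match the query.
    edges = {(u, v) for u, d in hits.items() for v in d.links if v in hits}

    # Undirected adjacency, used only to group documents into clusters.
    adj = {u: set() for u in hits}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    seen, clusters = set(), []
    for start in hits:
        if start in seen:
            continue
        nodes, stack = set(), [start]
        while stack:                            # depth-first traversal
            n = stack.pop()
            if n not in nodes:
                nodes.add(n)
                stack.extend(adj[n] - nodes)
        seen |= nodes
        clusters.append((nodes, {(u, v) for (u, v) in edges if u in nodes}))

    clusters.sort(key=lambda c: len(c[0]), reverse=True)
    return clusters


# Illustrative data only; the URLs and tag names are made up.
docs = [
    WebDoc("a.html", {"author": "Lim"}, {"b.html"}),
    WebDoc("b.html", {"author": "Lim and Ng"}),
    WebDoc("c.html", {"author": "Lim"}),
    WebDoc("d.html", {"author": "Tan"}, {"a.html"}),
]
result = query(docs, "author", "lim")
```

Each entry of `result` is one directed graph: a set of document URLs and the set of links among them. A fuller treatment would also let the query constrain the links themselves, which this sketch does not attempt.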
Similar resources
Retrieval of Legal Documents: Combining Structured and Unstructured Information
Legal information is often accessible via portal web sites. Legal documents typically combine structured and unstructured information, the former being tagged with markup languages such as XML (Extensible Markup Language). Current information retrieval research takes into account the structured information content of documents when computing the relevance ranking. Such an approach is very promi...
Comparative Study of Search Engine and Semantic Search Engine: A Survey
We all are aware of two letter word named Information Retrieval (IR) which is nothing but a process of retrieving or gathering information from a given document or a file. The concept of Information Retrieval has gained much height for many years because of large collection of information that is available in form of documents on Internet and to arrange and retrieve utilized words from them is ...
Aggregative Approximations for Information Retrieval in Semi-Structured Documents
Today's Web is huge in size, heterogeneous in both content and data structure, and is mainly accessed through syntactic and/or statistical criteria. Often, the user has to make several searches and investigate tens of documents to find the information of interest. The semantic Web was introduced to provide "meanings" to the information exchanged on the Web and ensure that sof...
Semantic Web Search Model for Information Retrieval of the Semantic Data
In this paper, we propose an ontology-based semantic web search model to enhance the efficiency and accuracy of information retrieval for unstructured and semi-structured documents. A new evaluation model is also proposed to measure the similarity between documents with semantic information. It is implemented and compared with existing web models.
Knowledge Retrieval and the World Wide Web
Large-scale search engines for the WWW retrieve entire documents effectively. However, they can be considered imprecise because they do not exploit and hence retrieve the semantic content of Web documents. Such content cannot yet be automatically extracted from general documents. Manually structuring Web documents, e.g. via mark-up languages such as XML, allows more precise information to be r...
A novel algorithm for enhancing search results by detecting dissimilar patterns based on correlation method
The dynamic collection and voluminous growth of information on the web poses great challenges for retrieving relevant information. Though most of the researchers focused their research work in the areas of information retrieval and web mining, still their focus is only on retrieving similar patterns leaving dissimilar patterns which are likely to contain the outlying data. So this paper concent...
Publication date: 2007